Indirect Cross-validation for Density Estimation
Abstract
A new method of bandwidth selection for kernel density estimators is proposed. The method, termed indirect cross-validation, or ICV, makes use of so-called selection kernels. Least squares cross-validation (LSCV) is used to select the bandwidth of a selection-kernel estimator, and this bandwidth is appropriately rescaled for use in a Gaussian kernel estimator. The proposed selection kernels are linear combinations of two Gaussian kernels, and need not be unimodal or positive. Theory is developed showing that the relative error of ICV bandwidths can converge to 0 at a rate of $n^{-1/4}$, which is substantially better than the $n^{-1/10}$ rate of LSCV. Interestingly, the selection kernels that are best for purposes of bandwidth selection are very poor if used to actually estimate the density function. This property appears to be part of the larger and well-documented paradox to the effect that “the harder the estimation problem, the better cross-validation performs.” The ICV method uniformly outperforms LSCV in a simulation study, a real data example, and a simulated example in which bandwidths are chosen locally.
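The two-step procedure described in the abstract (LSCV on a selection-kernel estimator, then rescaling for a Gaussian kernel) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a selection kernel of the form L(u) = (1 + alpha) * phi(u) - (alpha / sigma) * phi(u / sigma), which is one linear combination of two Gaussian kernels, and it assumes the usual asymptotic rescaling constant C = [R(phi) * mu2(L)^2 / R(L)]^(1/5). The function names, the grid search, and the values alpha = 2.4 and sigma = 5.0 are hypothetical choices made only for this example.

```python
import numpy as np


def norm_pdf(u, s=1.0):
    """Density of N(0, s^2) evaluated at u."""
    return np.exp(-0.5 * (u / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def sel_kernel(u, alpha, sigma):
    """Assumed selection kernel: (1 + alpha) * phi(u) - (alpha / sigma) * phi(u / sigma)."""
    return (1.0 + alpha) * norm_pdf(u) - alpha * norm_pdf(u, sigma)


def sel_kernel_conv(u, alpha, sigma):
    """Convolution L * L, in closed form because Gaussians convolve to Gaussians."""
    a = 1.0 + alpha
    return (a ** 2 * norm_pdf(u, np.sqrt(2.0))
            - 2.0 * a * alpha * norm_pdf(u, np.sqrt(1.0 + sigma ** 2))
            + alpha ** 2 * norm_pdf(u, sigma * np.sqrt(2.0)))


def lscv(b, x, alpha, sigma):
    """Least squares cross-validation criterion for bandwidth b and kernel L."""
    n = x.size
    d = (x[:, None] - x[None, :]) / b
    # Integral of the squared estimate, using the convolution L * L.
    int_f2 = sel_kernel_conv(d, alpha, sigma).sum() / (n ** 2 * b)
    # Mean of the leave-one-out estimates at the data points (diagonal removed).
    k = sel_kernel(d, alpha, sigma)
    loo = (k.sum() - np.trace(k)) / (n * (n - 1) * b)
    return int_f2 - 2.0 * loo


def rescale_constant(alpha, sigma):
    """Assumed constant C mapping the selection-kernel bandwidth to a Gaussian one."""
    r_phi = 1.0 / (2.0 * np.sqrt(np.pi))          # R(phi) = integral of phi^2
    r_l = sel_kernel_conv(0.0, alpha, sigma)      # R(L) = (L * L)(0) for symmetric L
    mu2_l = 1.0 + alpha - alpha * sigma ** 2      # second moment of L
    return (r_phi * mu2_l ** 2 / r_l) ** 0.2


def icv_bandwidth(x, alpha=2.4, sigma=5.0, n_grid=120):
    """Grid-search LSCV for L, then rescale the minimizer for a Gaussian kernel."""
    x = np.asarray(x, dtype=float)
    grid = np.geomspace(0.01, 2.0, n_grid) * x.std()  # hypothetical search range
    scores = [lscv(b, x, alpha, sigma) for b in grid]
    b_hat = grid[int(np.argmin(scores))]
    return rescale_constant(alpha, sigma) * b_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.normal(size=200)
    print("ICV bandwidth for a Gaussian kernel:", icv_bandwidth(sample))
```

Setting alpha = 0 reduces the assumed selection kernel to the Gaussian kernel, in which case the rescaling constant is 1 and the sketch coincides with ordinary LSCV on a grid.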
Similar resources
An Empirical Study of Indirect Cross-validation
In this paper we provide insight into the empirical properties of indirect cross-validation (ICV), a new method of bandwidth selection for kernel density estimators. First, we describe the method and report on the theoretical results used to develop a practical-purpose model for certain ICV parameters. Next, we provide a detailed description of a numerical study which shows that the ICV method u...
Penalized Likelihood Density Estimation: Direct Cross-Validation and Scalable Approximation
For smoothing parameter selection in penalized likelihood density estimation, a direct cross-validation strategy is illustrated. The strategy is as effective as the indirect cross-validation developed earlier, but is much easier to implement in multivariate settings. Also studied is the practical implementation of certain low-dimensional approximations of the estimate, with the dimension of the ...
Bandwidth selection in marker dependent kernel hazard estimation
Practical estimation procedures for local linear estimation of an unrestricted failure rate when more information is available than just time are developed. This extra information could be a covariate and this covariate could be a time series. Time dependent covariates are sometimes called markers, and failure rates are sometimes called hazards, intensities or mortalities. It is shown through s...
A Comparison of Cross-validation Techniques in Density Estimation (Comparison in Density Estimation)
In the setting of nonparametric multivariate density estimation, theorems are established which allow a comparison of the Kullback-Leibler and the Least Squares cross-validation methods of smoothing parameter selection. The family of delta sequence estimators (including kernel, orthogonal series, histogram and histospline estimators) is considered. These theorems also show that eithe...
Cross-Validation of Multivariate Densities
In recent years, the focus of study in smoothing parameter selection for kernel density estimation has been on the univariate case, while multivariate kernel density estimation has been largely neglected. In part, this may be due to the perception that calibrating multivariate densities is substantially more difficult. In this paper, we explicitly derive and compare multivariate versions of the b...